Effective potentials for Folding Proteins 
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A coarse-grained off-lattice model that is not biased in any way to the native state is proposed 
to fold proteins. To predict the native structure in a reasonable time, the model has included the 
essential effects of water in an effective potential. Two new ingredients, the dipole-dipole interaction 
and the local hydrophobic interaction, are introduced and are shown to be as crucial as the hydrogen 
bonding. The model allows successful folding of the wild-type sequence of protein G and may have 
provided important hints to the study of protein folding. 

PACS numbers: 87.15.Aa, 36.20.-r, 87.14.Ee 



The problem of predicting the native structure of a 
protein for a given sequence has been of great interest 
due to its relevancy to many fields in biology. In the 
crudest level, lattice models are proposed and have pro- 
vided important insights 0, ; however, due to the over- 
simplification, they are far from real applications. On 
the other hand, all-atom simulations deliver more details 
for the folding process, but the requirement of computa- 
tional resources tends to be realistically unaffordable Q . 
Developing models of coarse graining thus becomes the 
next step. For this purpose, off-lattice models Q using 
Go-type potentials have been used to explore the fold- 
ing dynamics. Since the relevant interactions are based 
on native structures, the Go-type potentials can not be 
used to predict structures. There are also models that 
succeeded in separately folding helix bundles or folding 
beta hairpins |fj . Nevertheless, the interacting potentials 
employed are also biased towards the native states. So 
far, there is no model that can fold proteins using re- 
alistic potentials, it is therefore desirable to construct a 
coarse-grained model that can fold proteins without be- 
ing biased in any way to the native state. 

In this paper, based on microscopic considerations, we 
propose a coarse-grained model with realistic potentials. 
The model has been tested successfully on more than 16 
small proteins, of sizes from 12 to 56 amino acids 0. 
For most examples, even without particularly optimizing 
our code, the computing time is reasonably short and is 
within the order of hours on ordinary desktop computers. 
Here, instead of exploring its predicting ability, we shall 
be focusing on only one protein (one of the protein G 
families with PDB ID : 1GB4) to illustrate the folding 
mechanism embedded in the proposed model. A brief 
summary of other important proteins is given in • 

In our model, side-chains are coarse-grained as spheres 
but explicit structures are kept in backbones 8]. On 
the other hand, water molecules are not included explic- 
itly but their effects are incorporated in effective poten- 
tials among side-chains and backbones. The hydropho- 



bic (HP) interaction has been known as the most impor- 
tant effect due to water. Recently, it is realized that the 
length-scale of water molecules has to be kept at short 
distances to prevent proteins collapsing prematurely |lfj| . 
Therefore, the desolvation model hd] combined with the 
Miyazawa-Jernigan (MJ) matrix 11] is employed to de- 
scribe the interaction among the side chains. Further- 
more, since the MJ potential is a non-neighboring inter- 
action, its extension to include nearest neighbors (n.n.) 
along the sequence is needed. Similar to the spirit of the 
HP model [2|, a local hydrophobic potential, VhocaiHP, is 
implemented by assigning potential energies to any suc- 
cessive pairs of amino acids according to their hydropho- 
bicity. On the other hand, the hydrogen bonding (HB) 
has long been thought as the key molecular interaction 
0. However, for small proteins, it is known that HB 
prefers the helix structure over the beta sheet because 
the former has a larger number of HBs. Thus it hints 
to include a second molecular interaction. Indeed, anal- 
ysis on the MJ matrix indicates that the electric dipole- 
dipole interaction dominates in the pair-wise interaction 
among side chains |12j. Microscopically, there is also 
charge imbalance in the CO-NH group on the amide 
plane with the magnitude of the dipole being estimated 
to be p — 1.15 x 10~ 19 Cm. Simple analyses reveal that 
the directions of these dipoles have strong correlation with 
the secondary structure |l.1| : In the alpha helix, succes- 
sive dipoles on the backbone tend to be in parallel; while 
in the beta sheet, they tend to change directions alter- 
nately (see Fig. 1 for example). In order to capture rele- 
vant energetics, we explicitly introduce the dipole-dipole 
interaction Vdd among the backbone elements. The po- 
tentials VhocaiHP and Vdd are the main ingredients that 
make our model different from early models. Remark- 
ably, our simulations indicate that these two interactions 
and the hydrogen bonding form the key interactions for 
determining the secondary structure. Specifically, we find 
that while the hydrogen bonding is essential to the for- 
mation of the alpha helix, to fold the beta sheet, both 
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Vdd and VLocaiHP are indispensable. 

The potential is constructed in a renormalized fash- 
ion: Except for global multiplicative scales (denoted by 
e a in the following, with a representing different contri- 
butions), interactions (such as Vdd and Vmj, see below) 
at large distances take the usual form; while for inter- 
actions (such as VhocaiHP and Vnd, see below) at suc- 
cessive neighbors (short distances) , since the variation of 
distance is unimportant, only angle variables are kept. 
The parameters employed in the potentials are adopted 
from experimental data @, while the scales e^'s are cal- 
ibrated based on a few proteins of known structures Q . 

The degrees of freedom for backbones are two Ra- 
machandran angles <j> and ip 14]. Since the peptide bond 
on any amide plane is partially double-bonded, the angle 
u> around the peptide bond is fixed to be 180° so that 
it corresponds to the trans conformation. The spheres 
that represent side-chains are centered at C^ and are at- 
tached to C"-atoms rigidly, and different effective radii 
are assigned in consistent with the geometric structures 
|l5| . In these representations and with all energies being 
in unit of kcal/mol, the potential can be written as 



Vtotal — Vst 



-VHB + V DD + VM,J + V Loca iHP + V A . (1) 



Here V ste ric enforces structural constraints such as hard- 
core potentials to avoid unphysical contacts. Vhb ac- 
counts for the hydrogen bonding between any non- 
neighboring NH (labeled by i) and CO (j) pair and is im- 
plemented as Vhb = £hb J2 n ,i,j u (nj)v(0 n ,ij), where r i3 
is the distance between Hi and Oj and u(r) is the stan- 
dard 12-10 Lennard- Jones potential with the equilibrium 
distance being set to the the averaged experimental value 
1.738 A 8J. The angle function v imposes the directional 
nature of HB, parameterized by three angles (n = 1, 2, 3): 
7T - ZC l O i H j , dOi A NjH 3 , and tt - /(),//,. V,. Their 
values are confined to the averaged experimental data 
H respectively: 26.77°, 11.60°, and 17.98°. To increase 
the efficiency of HB formation, certain uncertainty A8 is 
allowed. Empirically, AO = 60° is most efficient. 

The dipole term Vdd at large distances takes the or- 
dinary form 



Vdg — £dg ^ 



Pi- Pi 3 (ft ■ fij) (pj ■ nj) 



r?. 
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(2) 



where pi and pj are dipoles of either CO or NH, and the 
summation excludes successive dipoles. When dipoles are 
in successive neighbors, it is given by 



Vdn — £dn 



Pi ■ Pi+i 
PiPi+i 



- 1 



(3) 



Vmj is the extension of the MJ matrix with the form 
V M .i = cmj Y,i,jl v LJ ( r i 3 ) + Vgi (m) + V G2 (nj)]. Here 
Vlj is the MJ matrix element multiplied by the 



usual 12-6 Lennard-Jones potential with the equilib- 
rium distance being the sum of radii of two side-chains. 
Vgi + Vq2 represents the potential obtained numeri- 
cally in the desolvation model |l0j . For numerical pur- 
pose, however, we find that it is more convenient to 
use the following approximately analytic forms: Vqi = 



ei x exp 



x (nj -r b y 



is a Gaussian fit to the de- 



solvation barrier with r& being the position of desolva- 
tion barrier and a w being the size of the water molecule; 



while V G i = £2 x exp 



x (n. 



is an inverted 



Gaussian fit to the metastable minimum at r w due to 
water molecules. Here for the best fit, t\ ~ 5|ey|/9 and 
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The potential Vl 
side-chains 

V L 



iiHP acts only on successive pairs of 



ocalHP 
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Here qi represents the hydrophobicity or the charge state 
of the zth side-chain. Following Ref.^3| , qi are classi- 
fied into hydrophobic (H), polar(P), neutral(N), positive 
charged(+), and negative charged(— ). In this classifica- 
tion, N is regarded as a referential type such that when- 
ever qi — N or qi + i = N , V quqi+l = 0. Furthermore, 
when charged side-chains encounter other non-charged 
ones, they are considered as polar. Therefore, the only 
nontrivial potential energies are (Vhh-i Vpp, V-\ ) (at- 
tractive) and (Vhp,V ++ ) (repulsive). To implement the 
hydrophobic effects, an attractive pair acquires a neg- 



ative energy — e. 



1 when their C a C" lines are par- 
allel to each other, and when in other orientation, no 
energy is assigned; while for repulsive pairs, a negative 
energy — e gil9i+1 is assigned when their C a C" lines are 
anti-parallel. In practice, a smooth function is used to 
interpolate between finite Vq it q i+1 and zero. Finally, Va 
is an on-site potential in proportion to the area of each 
side-chain that is exposed to water. The proportional 
constant is en — (en) with i being the index for the side- 
chain. The existence of Va has already been found in 
the analysis of the MJ matrix ^3] and it helps to further 
contrast the hydrophobicity of each side-chain. 

The Monte Carlo method is employed to fold proteins. 
After careful calibration Q , the global scales are found to 
be e H B ~ 4.8, e D g ~ 0.2, e D N « 2.1, e M j ~ 0.2, and for 
VhocaiHP, £hh = £pp = £hp ~ 5.0, e ++ = — w 5.0. 
The same scales are adopted to simulate the protein 
1GB4, which is a wild-type protein with one alpha he- 
lix and two beta hairpins. Fig. 1 shows its spatial ar- 
rangement and corresponding dipole arrangement of our 
simulated energy ground state, while Fig. 2 shows the 
contact map. The native contact number ratio (Q) for 
simulated ground state is 0.6, while the RMSD is 2.97 
A. Clearly, our simulation is in good agreement with the 
experiment while the computing time is only a few hours 
on a P4-3.0GHz PC. Note that the ground state energy is 



FIG. 1: Schematic plot of the simulated protein G - 1GB4: 
native conformation and corresponding dipole configuration. 
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FIG. 2: Contact map of energy ground states (native struc- 
tures) for 1GB4 at ksT = 0.8 kcal/mol with blue square being 
our simulation result (Q = 0.6) and red square being the data 
from PDB. 

-545 kcal/mol and the nearest local minimum is about 34 
kcal/mol higher in energy. Furthermore, both the helix 
and the beta sheet are formed only when correct scales e's 
and appropriate temperature are employed. The porta- 
bility of these scales (and our model) to other proteins are 
tested in 15 proteins. The results are briefly summarized 
in Ref. 7]. Our results are generally in good agreement 
with experiments with the tolerance of e's being about 
0.5. Occasionally, the accuracy is not good. However, 
in that case, the cause is due to the metal ion not being 
included in our simulation 0. 

To clarify the roles of Vdd and VhocaiHP, the alpha 
helix (A24 to D37) and the beta hairpin with C terminus 
(G42 to E57) are extracted. The energy versus Q along 



FIG. 3: Effects of different strengths of Vdd on the formation 
of the alpha helix (o) and the beta hairpin with C terminus 
(b). The corresponding strengths: solid (black) - Vdd, circle 
(red) - 0.5Vdd, and triangle (green) - 0. Q is the native 
contact number ratio with Q = 1 corresponding to the native 
conformation. 

the folding is then monitored for different strengths of the 
potentials. Figs. 3(a) and 3(b) show the effect of Vdd for 
three different strengths. Clearly, we see that the native 
conformation (Q = 1) stays at the minimum for the helix, 
while for the beta hairpin, it gradually moves away from 
the minimum. When Vdd is completely turned off, the 
beta sheet is no longer the ground state. Similar analyses 
are done by tuning V Loca iHP as shown in Figs. 4(a) and 
4(b). We see that although affecting the formation of the 
alpha helix, both V L ocaiHP and Vdd have stronger effects 
on the formation of the beta sheet and can change the 
ground state completely. Similar behaviors also occur for 
the beta hairpin with N terminus and other 15 proteins. 
For 1GB4, if we turn off VhocaiHP and Vdd, only seg- 
ments of helices are formed. Therefore, both Vdd and 
VlocuIHP are responsible for the formation of the beta 
sheet. 

It should be noted that the above analyses are done 
with fixed Vhb, and the vanishing alpha helix in Fig. 4(a) 
can be restablized by increasing Vhb- However, similar 
restablization does not occur to the beta sheet due to the 
fact that the helix has more HBs. Therefore, when Vhb 
is large enough, the helix conformation always wins, and 
even a beta sheet will be turned into a helix. On the other 
hand, because successive dipoles in a helix tend to have 
unfavorable parallel orientations, sufficient strong Vdd 
can stabilize the beta sheet over the helix. Therefore, 
in the intermediate strength of Vh b > a beta sheet could 
form if the deficient energy due to smaller number of HBs 
is compensated by the energy gain of Vdd- 

Similar analysis on the MJ potential shows that instead 
of deciding the secondary structure explicitly, Vmj plays 
a crucial role in making its formation more efficiently. In 
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FIG. 4: Effects of different strengths of VlowIhp on the for- 
mation of the alpha helix (a) and the beta hairpin with C 
terminus (b). The corresponding strengths: solid (black) - 
VLocaiHP, circle (red) - 0.5Vloco.ihp, and triangle (green) - 0. 

early stage of folding, Vmj collapses all residues into a 
compact space. Only when the collapsing happens, inter- 
actions of shorter ranges could function. If the initial col- 
lapsing does not go in the right direction or happens too 
fast, the final protein structure may become disordered. 
After the initial collapsing, the potentials Vdd, Vl co,ihp, 
and Vhb start to dominate. At this point, an obvious 
question remains to be addressed: Since both Vdd and 
Vhb are sequence independent, then for a given sequence, 
what determines that it should fold into a helix or a beta 
sheet? This is where VlocoIHP comes into play because 
it forces successive neighboring side-chains to be either 
on the same side or on the opposite side of the back- 
bone according to their hydrophobicity. Thus different 
sequences result in different local spatial arrangements 
of side-chains, and only when the arrangement is correct, 
the protein can be compacted into the correct secondary 
structure. Finally, our analysis shows that even though 
the native state is still the ground state in the absence 
of Va , incorrect strength of Va would result in itinerant 
motion of the secondary structures. Therefore, Va is pri- 
marily responsible for stabilizing the tertiary structure. 

In conclusion, an effective potential that can fold pro- 
teins without being biased to the native state is con- 
structed and tested. All testing peptides can fold to their 
native states in acceptable computing time. By system- 
atically tuning relative strengths of interactions in the 
potential, the dipole-dipole interaction Vdd and the lo- 
cal hydrophobic interaction VlocuIHP are shown to be 
as crucial as the hydrogen bonding Vhb- While Vhb 
prefers the helix structures, Vdd tends to form sheet-like 
structures. Only when a subtle balance between these 
two interactions holds, the helix and sheet structures can 



co-exist. The sequence- dependent potential Vloco,ihp is 
then responsible for the final selection of either a helix or 
a beta sheet forming. 
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